Data Science on Blockchain: How to Get Started?

Two analysis examples: NFTs and Helium

Thomas de Marchin

14 December 2022

Introduction

  • Associate Director, Statistics & Data Science at Pharmalex: supporting pharmaceutical companies in drug development
  • Gravitated towards blockchain technology and the place of data science within it
  • Lots of data are available on the blockchain (~1 million transactions per day on Ethereum), but how can we access them?
  • Obtaining cryptocurrency price data is straightforward, but more sophisticated data manipulations require access to the “source” data
  • Github: https://github.com/tdemarchin
  • Blog: https://tdemarchin.medium.com/

Blockchain is accessible but hard to read

  1. Several TB of data

  2. Data are stored sequentially, block by block, so following a transaction requires developing specific tools.

  3. The structure of a transaction is difficult to read

  4. Fragmentation of blockchain technologies
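To make point 2 concrete, here is a toy R sketch (made-up blocks and addresses, not real chain data) of what "following a transaction" involves: because transfers are stored block by block, rebuilding the ownership history of a single token means scanning every block in order. This is why ETLs and data providers index the chain once instead of rescanning it for every question.

```r
# Toy chain (made-up data): a list of blocks, each holding a few token transfers
blocks <- list(
  list(number = 100, transfers = data.frame(tokenId = c(7, 9), from = c("0x0", "0x0"), to = c("0xA", "0xB"))),
  list(number = 101, transfers = data.frame(tokenId = 9, from = "0xB", to = "0xC")),
  list(number = 102, transfers = data.frame(tokenId = 7, from = "0xA", to = "0xD"))
)

# Scan the blocks sequentially and collect the successive owners of one token
owner_history <- function(blocks, token) {
  owners <- character(0)
  for (b in blocks) {
    hits <- b$transfers[b$transfers$tokenId == token, ]
    owners <- c(owners, as.character(hits$to))
  }
  owners
}

owner_history(blocks, 7)  # "0xA" "0xD"
```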

How to get the data?

  1. Set up an ETL (Extract, Transform, Load) pipeline: the most flexible option, but it requires a server with large, fast hard drives.
  2. Use data providers:
  • Dashboards: Dune Analytics, Nansen, GraphSense, icy.tools, …
  • APIs (software intermediary that allows two applications to talk to each other): OpenSea, Etherscan, Infura, The Graph,…

Two examples implemented in R:

  1. How to track and visualize NFTs: OpenSea and EtherScan APIs
  2. Blockchain IoT data visualization and the rise of the Helium network: full data dump from The Helium Foundation ETL

NFTs

  • Non-Fungible Tokens: represent ownership of unique items (art, collectibles, patents, real estate,…)
  • Smart contracts: decentralized programs stored on a blockchain that run when predetermined conditions are met
  • What can we do with NFT-related data?

NFTs price analysis with OpenSea API

OpenSea: a major NFT marketplace

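library(httr)

# Most recent sales ("successful" events) recorded on OpenSea
resOpenSea <- GET("https://api.opensea.io/api/v1/events",
                  query = list(limit = 300,
                               event_type = "successful",
                               only_opensea = "true"))
events <- content(resOpenSea, as = "parsed")  # parse the JSON response into R lists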

  • At most 300 transactions, limited to sales made on OpenSea
  • Data pre-processed by OpenSea, not raw blockchain transactions
  • Mainly suited to price analysis
  • Analysis done in June 2021: Link
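Prices returned by blockchain-based APIs are usually expressed in wei, the smallest ETH unit (1 ETH = 10^18 wei), so a first step of the price analysis is a unit conversion. A minimal sketch, assuming each parsed event stores its price as a character string in wei under a field named `total_price` (the field name and the three sales below are illustrative assumptions):

```r
# Convert prices quoted in wei (1 ETH = 10^18 wei) into ETH.
# as.numeric() keeps about 15-16 significant digits, ample for price analysis.
wei_to_eth <- function(wei) {
  as.numeric(wei) / 1e18
}

# Hypothetical parsed events (assumed field name: total_price, in wei)
events <- list(
  list(total_price = "1500000000000000000"),
  list(total_price = "250000000000000000"),
  list(total_price = "30000000000000000000")
)
prices_eth <- wei_to_eth(vapply(events, function(e) e$total_price, character(1)))
summary(prices_eth)
```

Wei amounts can exceed R's integer range, which is why they are kept as strings until converted to doubles.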

NFTs tracking with EtherScan API: the transfers

  • Weird Whales: a collection of 3350 programmatically generated whales, each with its own characteristics and traits. Created by Benyamin Ahmed, a 12-year-old programmer whose project made headlines.
  • EtherScan: block explorer to view transaction information, verify contract code and visualize network data → access to more raw data
  • Analysis done in October 2021 and updated in December 2022: Link
  • Weird Whales are managed by a specific smart contract on the Ethereum blockchain

  • To extract information from the blockchain more easily, we can read events: easy-to-read signals that smart contracts emit (here, the Transfer event fired whenever a whale is minted or changes hands).

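library(httr)

# Transfer events emitted by the Weird Whales contract. topic0 is the keccak-256 hash
# of the event signature Transfer(address,address,uint256).
# fromBlock and EtherScanAPIToken (your EtherScan API key) must be defined beforehand.
resEventTransfer <- GET("https://api.etherscan.io/api",
                        query = list(module = "logs",
                                     action = "getLogs",
                                     fromBlock = fromBlock,
                                     toBlock = "latest",
                                     address = "0x96ed81c7f4406eff359e27bff6325dc3c9e042bd",
                                     topic0 = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
                                     apikey = EtherScanAPIToken))
logs <- content(resEventTransfer, as = "parsed")$result  # list of event logs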
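Each returned log still has to be decoded. For an ERC-721 Transfer event, the sender, receiver and token ID are indexed, so they appear in the `topics` field after the signature hash, each left-padded to 32 bytes. A minimal decoding sketch, assuming token IDs are small enough to be held exactly in a double (true for IDs below 2^53); the sample log and wallet address are made up:

```r
# Convert a hex string (with or without "0x") to a double.
# Exact for values below 2^53, which covers small token IDs.
hex_to_num <- function(h) {
  digits <- strsplit(tolower(sub("^0x", "", h)), "")[[1]]
  Reduce(function(acc, d) acc * 16 + (match(d, c(0:9, letters[1:6])) - 1), digits, 0)
}

# An address is the last 20 bytes (40 hex characters) of its 32-byte topic
topic_to_address <- function(t) paste0("0x", substr(t, nchar(t) - 39, nchar(t)))

# topics[1] = event signature, topics[2] = from, topics[3] = to, topics[4] = tokenId
decode_transfer <- function(topics) {
  data.frame(from = topic_to_address(topics[2]),
             to = topic_to_address(topics[3]),
             tokenId = hex_to_num(topics[4]))
}

# Illustrative log: mint (sender = zero address) of token 3350 to a made-up wallet
sample_topics <- c(
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
  paste0("0x", strrep("0", 64)),
  paste0("0x", strrep("0", 24), strrep("ab", 20)),
  paste0("0x", strrep("0", 61), "d16")
)
decode_transfer(sample_topics)
```

On real data, the same function can be applied to every entry of the parsed getLogs `result` list (after `unlist()` on each entry's `topics`) and the rows combined with `do.call(rbind, ...)`.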

Where is the sales price? On OpenSea, sales are handled by OpenSea's main contract, which, if the sale is approved, calls the second contract (here Weird Whales), which in turn triggers the transfer → we need to download all the transactions of the OpenSea main smart contract address and then filter for the ones related to Weird Whales (~10,000 API calls, which can take several hours…).
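The filtering step can be sketched as a join on the transaction hash: an OpenSea transaction belongs to Weird Whales if the same transaction also emitted a Weird Whales Transfer event. This is a minimal sketch under assumptions: the column names `hash`, `tokenId` and `value` (ETH sent, in wei) are hypothetical, and the hashes are shortened placeholders.

```r
# Decoded Weird Whales transfers (hypothetical columns, placeholder hashes)
transfers <- data.frame(
  hash = c("0xaaa", "0xbbb", "0xccc"),
  tokenId = c(12, 845, 3350)
)
# Transactions sent to the OpenSea main contract (value in wei, as a string)
opensea_tx <- data.frame(
  hash = c("0xbbb", "0xddd", "0xccc"),
  value = c("2000000000000000000", "500000000000000000", "750000000000000000")
)

# Inner join: keep only OpenSea transactions that also moved a whale
sales <- merge(transfers, opensea_tx, by = "hash")
sales$price_eth <- as.numeric(sales$value) / 1e18
sales
```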

  • The ETH/USD rate is highly volatile, so converting ETH prices to USD requires the historical ETH market price at the time of each sale.
  • A spline is fitted to historical price data from the CoinGecko API
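One way to do this in base R is `splinefun()`, which builds an interpolating cubic spline through the daily prices that can then be evaluated at the exact time of each sale. A minimal sketch, assuming the daily ETH/USD prices have already been downloaded into a data frame; the prices and sales below are made up, and the original analysis may have used a different spline type:

```r
# Daily ETH/USD prices (made-up values standing in for CoinGecko data)
eth_usd <- data.frame(
  date = as.Date("2021-10-01") + 0:4,
  price = c(3300, 3400, 3350, 3500, 3450)
)

# Interpolating natural cubic spline, with time in days since 1970-01-01
price_at <- splinefun(as.numeric(eth_usd$date), eth_usd$price, method = "natural")

# Sale timestamps converted to fractional days since 1970-01-01 (same scale)
sale_time <- as.POSIXct(c("2021-10-02 00:00:00", "2021-10-03 12:00:00"), tz = "UTC")
sale_day <- as.numeric(sale_time) / 86400
sale_eth <- c(1.5, 0.8)

sale_usd <- sale_eth * price_at(sale_day)
sale_usd
```

A sale falling exactly on a daily data point gets that day's price; sales in between get a smoothly interpolated rate instead of a step change at midnight.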

NFTs tracking with EtherScan API: analysis and visualisation

  • High price variability at the beginning (buzz), followed by a quieter period.
  • Starting March 2022, all sales prices fell to 0 → OpenSea was hacked in March and probably updated its smart contract…

  • Networks are described by:

    • vertices (or nodes): the wallet addresses
    • edges (or links): the transactions
  • Each color represents a unique token ID

  • Made with the network and ggraph packages
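The underlying structure is simply an edge list of decoded transfers. The sketch below builds the vertex set and each wallet's in- and out-degree in base R, rather than with the network package used for the figure, so it runs without extra dependencies; the transfers and shortened addresses are made up:

```r
# Decoded transfers: each row is a directed edge from one wallet to another
transfers <- data.frame(
  from = c("0x0", "0x0", "0x0", "0xA", "0xB", "0xA"),
  to   = c("0xA", "0xB", "0xA", "0xC", "0xC", "0xD"),
  tokenId = c(1, 2, 3, 1, 2, 3)
)

# Vertices: every wallet appearing as sender or receiver
vertices <- unique(c(transfers$from, transfers$to))

# Out-degree (NFTs sent) and in-degree (NFTs received) per wallet
degrees <- data.frame(
  wallet = vertices,
  sent = as.vector(table(factor(transfers$from, levels = vertices))),
  received = as.vector(table(factor(transfers$to, levels = vertices)))
)
degrees
```

The `tokenId` column is what the figure maps to edge colour.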

About 2/3 of the transactions happened very shortly after the NFT’s creation

  • Made with the network and networkDynamic packages
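A statistic like "share of transactions shortly after creation" can be computed by taking each token's mint (a transfer from the zero address) as its creation time and measuring the delay of every later transfer. A minimal sketch under assumptions: the `time` column, the made-up transfers and the 7-day window are all illustrative, not the threshold used in the analysis:

```r
zero <- paste0("0x", strrep("0", 40))  # mints are transfers from the zero address

# Made-up decoded transfers with timestamps
transfers <- data.frame(
  tokenId = c(1, 1, 1, 2, 2),
  from = c(zero, "0xA", "0xC", zero, "0xB"),
  time = as.POSIXct(c("2021-08-01", "2021-08-02", "2021-11-01",
                      "2021-08-01", "2021-09-15"), tz = "UTC")
)

# Creation time of each token
mints <- transfers[transfers$from == zero, c("tokenId", "time")]
names(mints)[2] <- "mint_time"

# Delay (in days) between each later transfer and its token's mint
after <- merge(transfers[transfers$from != zero, ], mints, by = "tokenId")
delay_days <- as.numeric(difftime(after$time, after$mint_time, units = "days"))

share <- mean(delay_days <= 7)  # fraction of post-mint transfers within the window
share
```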

Helium IoT network

Helium is a decentralized wireless infrastructure for IoT devices (environmental sensors, localisation sensors to track bike fleets,…). It is a blockchain that leverages a decentralized global network of Hotspots. People are incentivized to install hotspots and become a part of the network by earning Helium tokens, which can be bought and sold like any other cryptocurrency.

  • How big is the Helium network?
  • Where are the hotspots located?
  • Are they actively utilized, i.e. are they used to transfer data with connected devices?
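Two of these questions reduce to simple aggregations once the dump is loaded. A minimal sketch, assuming a hypothetical per-hotspot table with the date each hotspot was added and the number of data packets it relayed; the table layout, column names and values are made up and do not reflect the actual Helium ETL schema:

```r
# Hypothetical per-hotspot extract (made-up values)
hotspots <- data.frame(
  address = paste0("hs", 1:6),
  added = as.Date(c("2020-01-15", "2020-06-01", "2021-02-10",
                    "2021-02-20", "2021-05-03", "2021-05-30")),
  packets = c(120, 0, 35, 0, 0, 8)
)

# How big is the network? Cumulative number of hotspots per month
month <- format(hotspots$added, "%Y-%m")  # "YYYY-MM" sorts chronologically
growth <- cumsum(table(month))
growth

# Are hotspots actively utilized? Share that relayed at least one data packet
share_active <- mean(hotspots$packets > 0)
share_active
```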
